ABSTRACT


Recently IBM Corporation has declassified an algorithm for
encryption usable for computer-to-computer or
computer-to-terminal communications. Their algorithm was
implemented in a hardware device called Lucifer. A software
implementation of Lucifer for Multics is described. A proof
of the algorithm's reversibility for deciphering is
provided. A special hand-coded (assembly language) version
of Lucifer is described whose goal is to attain performance
as close as possible to that of the hardware device.
Performance measurements of this program are given.
Questions addressed are: How complex is it to implement in
software an algorithm designed primarily for digital
hardware? Can such a program perform well enough for use in
the I/O system of a large time-sharing system?


Author:  G.  Gordon  Benedict 

Thesis Supervisor: Prof. Jerome H. Saltzer
OVERVIEW

This thesis examines the enciphering algorithm Lucifer,
recently released by IBM. This algorithm is described as a
hardware mechanism in "The Design of Lucifer, a
Cryptographic Device for Data Communications", by J. Lynn
Smith; this was the primary source document.

A proof is given of Lucifer's reversibility, that is, that
it will in fact correctly decipher its own previously output
ciphertext when provided with the same key used for
enciphering. Two software implementations are described and
their performance measured.

This paper is divided into five sections and four
appendices. "Introduction to Enciphering" briefly explains
the uses of enciphering as a security enhancement in
computer-to-computer and computer-to-terminal communication.
"Enciphering Algorithms and Lucifer in Particular" lists
some criteria for a good computer-oriented cipher and
depicts the general operation of Lucifer without much
detail. Sufficient detail is, however, given for
understanding "A Simple Proof of Lucifer's Reversibility".
That section provides an informal proof that Lucifer works,
in that it correctly deciphers its own ciphertext. "The
Multics Software Implementation" demonstrates how to use the
enciphering programs. The final section, "Timing and
Conclusions", presents performance measurements of a PL/I
and a Multics assembly language version of Lucifer.
Appendix A, "Operation of the Lucifer Hardware", details the
operation of the hardware device described by Smith.
Appendix B, "The PL/I Implementation", details a software
version in the PL/I language designed to simulate the
Lucifer hardware closely in its operation and to be readable
and exportable. Appendix C, "The Assembly Language
Implementation", details a version of Lucifer optimized for
execution time. For those readers unfamiliar with the
Multics hardware, "An Introduction to Multics Assembler"
briefly explains those features of the Honeywell model 6180
processor used by Lucifer.
INTRODUCTION TO ENCIPHERING

Much attention has been paid recently to computer and
data security. Computer security consists of restricting the
use of computer facilities to those people or tasks
authorized to use them. This has been attempted by such
mechanisms as passwords, protection rings, and privileged
instructions. Data security is becoming more important with
the advent of government and corporate personal-data files.
This problem is magnified if the computer system is
available to many users via telecommunications. Given the
above facilities for regulating computer facility use,
access control is one mechanism that is available for
preventing unauthorized access to data files. However, this
mechanism fails when data is transmitted over telephone
lines, radio links, or physical (mail or courier) shipments.
Except in the case of a courier, such communications are
easily tapped without the legitimate user's knowledge. Even
more insidious than the traditional reading of sensitive
data is the insertion of spurious data designed to confuse
or misdirect the operation of a system. One mechanism for
minimizing this problem is enciphering the data, which
protects the data itself rather than the medium over which
it is transmitted.


Enciphering is a process whereby transformations are
made on the message (cleartext), usually at the bit or
character level. If the algorithm is known, the cipher may
be breakable by analyzing the ciphertext, particularly if
sample cleartext for some of the ciphertext is available.
Since an enciphering algorithm must be reversible to be
useful, anyone who knows the algorithm could otherwise
reverse it; a key known only to the message originator and
the intended receiver is therefore also used. Thus if the
key is intercepted or deduced, the cipher is cracked. The
essence of successful cryptology lies in devising an
enciphering algorithm that cannot be cracked within the
useful lifetime of the message, and in keeping the key
secret.

Enciphering helps prevent the insertion of spurious data
to confuse a computer, as well as the reading of secret
data. This is because a random message inserted onto the
communication link will probably decipher to unrecognizable
garbage. The algorithm implemented in this paper is so
constructed that if one bit is changed in a legitimate
enciphered message, the deciphered text will almost
certainly be unrecognizable. This prevents the form of
interference wherein a saboteur records (taps) the
ciphertext, changes some bits randomly without even
understanding the message, and inserts the text onto the
telephone lines. Unrecognizable text can usually be
rejected by the computer. There still remains the problem
of the saboteur who records the ciphertext and replays it
unchanged later. This can be extremely damaging to
unrepeatable or irreversible processes. One method of
avoiding this problem is message chaining, whereby a part of
the previous data exchange is enciphered in the current data
exchange as a verification field. Thus the same message
replayed tomorrow would contain an out-of-date verification
field and be rejected. The operation of such a system is
discussed at length in Smith's paper.
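The message-chaining idea can be illustrated with a minimal Python sketch of the bookkeeping on each end of the link. It is an illustration under stated assumptions, not Smith's protocol: the length of the verification field (VERIFY_LEN), the all-zero starting value, and the class name ChainedEndpoint are hypothetical, and the enciphering of each whole message, which is what stops a saboteur from forging the field, is omitted.

```python
VERIFY_LEN = 8  # hypothetical: number of trailing bytes of the previous exchange to echo


class ChainedEndpoint:
    """Tracks the previous exchange so each message can carry a verification field."""

    def __init__(self) -> None:
        # Hypothetical agreed starting value before any message has been exchanged.
        self.last_exchange = b"\x00" * VERIFY_LEN

    def make_message(self, payload: bytes) -> bytes:
        # Verification field = tail of the previous exchange, placed before the payload.
        # In a real link the whole message would now be enciphered (not shown).
        message = self.last_exchange[-VERIFY_LEN:] + payload
        self.last_exchange = message
        return message

    def accept(self, message: bytes) -> bool:
        # Reject the message unless its field matches our record of the previous exchange.
        if message[:VERIFY_LEN] != self.last_exchange[-VERIFY_LEN:]:
            return False
        self.last_exchange = message
        return True
```

A replayed message still carries the verification field that was current when it was first sent, so once any later exchange has taken place the receiver's record no longer matches and the replay is rejected.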

Enciphering can also be used for computer-to-terminal
communications.  The  terminal  would  contain  a  hardware 
deciphering  module;  the  algorithm  described  here  was 
designed  with  this  purpose  in  mind.  The  user  could  have  his 
key  on  a  magnetic  card,  or  he  could  type  it  in  on  the 
terminal.  The  computer  would  contain  a  central  file  of  all 
users'  keys  and  a  software  or  hardware  version  of  the 
enciphering  module. 

Enciphering can add some security to online files
against the possibility of random hardware or software
failures or physical theft of backup tapes, disk packs,
etc. Enciphering in this application merely adds another
dimension of security.

This paper details an enciphering algorithm developed
by Feistel and Smith of IBM for computer-to-terminal
communications. A software version has been prepared,
intended to be used as part of the input/output software or
the network interface of Multics. A command to encipher and
decipher online segments has also been written. A proof of
the algorithm's reversibility is also given; this was hinted
at but not proved in the Smith and Feistel papers.
ENCIPHERING ALGORITHMS AND LUCIFER IN PARTICULAR

There are several desiderata in the design of an
enciphering algorithm. The algorithm should be easily
implemented in hardware, yet provide a great measure of
security against cryptanalysts, especially against those
armed with computers of their own.

Many traditional algorithms have operated by performing
one-for-one character substitutions based on the key. For
example, the "Vigenère-Vernam" ciphers use a square array of
characters. To encipher, each character of cleartext is
used as a column index into this array; the character of the
key corresponding to this character of cleartext (i.e., the
nth character of the key corresponds to the nth character
of cleartext) is used as a row index. The character at the
intersection is the corresponding ciphertext character. The
key is repeated as many times as necessary to exhaust all
characters of cleartext. The square array can contain
essentially any characters. The weaknesses of these ciphers
arise from the key repetition and the simple substitution of
a very short message element (a character). Such ciphers are
subject to frequency analysis, particularly if a sample of
cleartext is available. This oversimplified account is
drawn from "Cryptology, the Computer, and Data Privacy" by
M. B. Girdansky.
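As a concrete illustration of the tableau cipher just described, here is a minimal Python sketch. It assumes the classic tableau in which row r is the 26-letter alphabet shifted by r positions (the text notes the array can hold essentially any characters) and handles only uppercase letters; the function names are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase  # assumption: 26-letter uppercase alphabet

# Square array: row r is the alphabet shifted left by r positions (classic tableau).
TABLEAU = [ALPHABET[r:] + ALPHABET[:r] for r in range(len(ALPHABET))]


def vigenere_encipher(cleartext: str, key: str) -> str:
    """Key character selects the row, cleartext character selects the column."""
    out = []
    for i, ch in enumerate(cleartext):
        row = ALPHABET.index(key[i % len(key)])  # the key repeats as often as needed
        col = ALPHABET.index(ch)
        out.append(TABLEAU[row][col])
    return "".join(out)


def vigenere_decipher(ciphertext: str, key: str) -> str:
    """Find the column of the key's row that holds the ciphertext character."""
    out = []
    for i, ch in enumerate(ciphertext):
        row = ALPHABET.index(key[i % len(key)])
        col = TABLEAU[row].index(ch)
        out.append(ALPHABET[col])
    return "".join(out)


print(vigenere_encipher("ATTACKATDAWN", "LEMON"))  # LXFOPVEFRNHR
```

Because the five-letter key repeats, cleartext letters five positions apart are enciphered with the same row of the array; this periodicity, together with the one-character substitution unit, is what frequency analysis exploits.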

The algorithm developed by Smith and Feistel uses the
traditional enciphering mechanisms of substitution of
strings and modulo arithmetic on strings. However, through
repeated cycles, the substitution is in effect performed not
on small characters but on 128-bit blocks. Thus such methods
as frequency analysis would require computation time on the
order of the lifetime of the universe.

This  algorithm,  called  Lucifer,  has  the  added 
advantages  of  simple  hardware  implementation  with 
shift-registers  and  easy  reversibility.  A  general 
description  of  the  algorithm  follows  and  then  a  proof  of 
its  reversibility. 

The basic transformations used are one-to-one mappings
and exclusive-ors (mod-2 addition). The input is divided
into equal-sized blocks; each block is processed completely
independently of the others. The following description
refers to one block only. Because each block stands alone,
it is desirable from a cryptographic point of view to use as
large a block size as possible: the more bits that affect a
given bit of ciphertext, the harder the job of the
cryptanalyst. As mentioned before, a basic weakness in many
ciphers is the small block size.

A block is broken into a top half and a bottom half.
The bottom half, without itself being changed, is broken
into easily manipulable units called bytes. Each byte
undergoes one of two one-to-one transformations, depending
upon a bit of the key. This collection of transformed bytes
is referred to as the confused bytes, and the operation is
referred to as confusion. Next, each bit of the confused
bytes is modulo-2 summed with a different bit of the key.
This operation is referred to as interruption. Now these
bytes are modulo-2 summed with the top half of the
cleartext, the half previously unused. This is called
diffusion. The two halves are then swapped; this operation
is called interchange. Sixteen such cycles occur. One
complete confusion-interruption-diffusion cycle is called a
CID cycle. The schedule for accessing key bits is so
arranged that every key bit is used both for controlling the
confusion transformation and for interruption. The
interchange operation occurs on every cycle except the last.
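The cycle structure just described can be sketched in Python at byte granularity. This is not Lucifer itself: the two one-to-one byte maps t0 and t1, the bit ordering of the TCR byte, and the key-schedule helper round_keys are hypothetical stand-ins, and the fixed permutation that the real device applies before diffusion is omitted. What the sketch does follow is the shape of the computation: confusion chosen by one key bit per byte, interruption by exclusive-or with key bits, diffusion by exclusive-or into the top half, and an interchange after every cycle except the last. Deciphering runs the same code with the per-cycle key material in reverse order.

```python
def t0(b: int) -> int:
    """Hypothetical one-to-one byte map: an odd multiplier mod 256 is a bijection."""
    return (5 * b + 3) & 0xFF


def t1(b: int) -> int:
    """Hypothetical one-to-one byte map: rotate the byte left by three bits."""
    return ((b << 3) | (b >> 5)) & 0xFF


def cid_cycle(top, bottom, tcr, int_key):
    """One confusion-interruption-diffusion cycle; returns the new top half.

    The bottom half is only read, never changed.
    """
    new_top = []
    for i in range(8):
        # Confusion: bit i of the TCR byte selects the map applied to byte i.
        confused = t1(bottom[i]) if (tcr >> i) & 1 else t0(bottom[i])
        # Interruption: modulo-2 sum with key bits.
        interrupted = confused ^ int_key[i]
        # Diffusion: modulo-2 sum into the top half.
        new_top.append(top[i] ^ interrupted)
    return new_top


def run_cycles(block: bytes, keys) -> bytes:
    top, bottom = list(block[:8]), list(block[8:])
    for n, (tcr, int_key) in enumerate(keys):
        top = cid_cycle(top, bottom, tcr, int_key)
        if n != len(keys) - 1:  # interchange after every cycle except the last
            top, bottom = bottom, top
    return bytes(top + bottom)


def round_keys(key: bytes, cycles: int = 16):
    """Hypothetical byte-level schedule: cycle n reads 8 key bytes starting at 7n mod 16."""
    keys = []
    for n in range(cycles):
        start = (7 * n) % 16
        int_key = [key[(start + j) % 16] for j in range(8)]
        keys.append((int_key[0], int_key))  # first byte doubles as the TCR byte
    return keys


def encipher(block: bytes, keys) -> bytes:
    return run_cycles(block, keys)


def decipher(block: bytes, keys) -> bytes:
    # Same cycles; the per-cycle key material is consumed last-to-first.
    return run_cycles(block, list(reversed(keys)))
```

Note that decipher reuses run_cycles unchanged and differs from encipher only in the order of the key material, mirroring the statement that the order of key-bit access is the only difference between the two operations.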
Figure 1: Flowchart


Figure 1 shows a flowchart of the operation. Thus the
algorithm consists of:

Figure  2:  Block  Diagram 


The only difference between enciphering and deciphering
is the order in which the key bits are accessed. Within CID
cycle n during deciphering, key bits are accessed in the
same order as in CID cycle 15 - n in enciphering. These
operations, explained in general here, are fully detailed in
Appendix A - Operation of the Lucifer Hardware.

This  leads  to  a  simple  proof  of  reversibility,  as 
explained  in  the  next  section. 


A PROOF OF LUCIFER'S REVERSIBILITY

Assume there are n + 1 CID cycles, numbered 0 through n,
and thus n interchanges. Call the input to the last CID
cycle (the output of cycle n - 1, after the interchange that
follows it) $M_0 \| M_1$, where $M_0$ is the first (top)
half of the message, $M_1$ is the second (bottom) half, and
the double vertical bar represents concatenation. Call the
output of cycle n $C_0 \| C_1$. Cycle n transforms
$M_0 \| M_1$ as follows, where $\oplus$ denotes exclusive-or.
Confusion: a transformation $T(M_1)$ is applied. Which
transformation is used depends on bits of the key (one for
each byte of $M_1$), but since the same key bits will be
accessed for the same byte positions during deciphering, the
specific transformations selected are irrelevant, as long
as they are all one-to-one. Interruption: $T(M_1)$ is
exclusive-ored with specific key bits $K_1$. Diffusion:
$T(M_1) \oplus K_1$ is exclusive-ored with the top half. The
total message is thus
$C_0 \| C_1 = (T(M_1) \oplus K_1 \oplus M_0) \| M_1$.
Remember that on cycle n no interchange occurs.

On deciphering, this output is fed into decipher cycle 0,
which accesses the key exactly as encipher cycle n did.
Since the bottom half is still $M_1$, confusion and
interruption will generate $T(M_1) \oplus K_1$ just as
before. When this is exclusive-ored with the top half,
$T(M_1) \oplus K_1 \oplus M_0$, the original $M_0$ is
regenerated.
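Written out for the last cycle, with $\oplus$ for exclusive-or, $M_0$ and $M_1$ for the top and bottom halves entering that cycle, $T$ for the key-selected confusion, and $K_1$ for the interruption key bits, the encipher step and the decipher step that undoes it are:

```latex
\begin{aligned}
C_0 &= M_0 \oplus T(M_1) \oplus K_1, \qquad C_1 = M_1 \\
C_0 \oplus T(C_1) \oplus K_1 &= M_0 \oplus T(M_1) \oplus K_1 \oplus T(M_1) \oplus K_1 = M_0
\end{aligned}
```

The second line uses only $x \oplus x = 0$ and the commutativity and associativity of exclusive-or, together with the fact that the bottom half $C_1 = M_1$ passes through the cycle unchanged.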

The interchange that precedes encipher cycle n
corresponds to the interchange that follows decipher cycle
0, so the output of that decipher interchange also matches
the output of encipher cycle n - 1 before its interchange.
Thus encipher interchange n - 1 and CID cycle n are exactly
undone by decipher CID cycle 0 and interchange 0. These
cycles can therefore be stripped off, and the same proof
applied to a Lucifer consisting of n CID cycles and n - 1
interchanges. Eventually a Lucifer of one CID cycle and zero
interchanges remains; this has already been demonstrated
above to be reversible.

In  the  actual  specific  operation  of  Lucifer,  the 
diffusion  operation  does  not  consist  of  a  simple 
exclusive-or;  instead  the  bits  are  permuted  in  a  fixed 
fashion  before  diffusion.  This  does  not  affect  the 
reversibility,  since  the  ciphertext  will  undergo  the  same 
permutation  and  thus  each  cycle  will  regenerate  the  input  of 
the  corresponding  encipher  cycle.  However,  this  permutation 
is  necessary  for  the  cipher  to  be  difficult  to  break.  It 
ensures  that  small  differences,  say  a  one-bit  change,  in  a 
given message block will propagate throughout all the bits
of  that  block  of  ciphertext.  Each  bit  of  cleartext 
potentially  affects  every  bit  of  ciphertext,  within  a 
128-bit  block. 
THE MULTICS SOFTWARE IMPLEMENTATION


Two programs were written as implementations of the IBM
hardware  versions  of  Lucifer.  One  is  a  straightforward  PL/I 
program  which  manipulates  the  bits  in  essentially  the  same 
fashion  the  hardware  does.  The  other  is  a  Multics  assembly 
language  program  optimized  for  speed  of  execution.  Details 
and  listings  of  each  may  be  found  in  the  appendices. 
Instructions  on  using  them  are  given  here. 

First,  a  key  must  be  supplied.  This  is  done  by  calling 
the  set_key  entry: 

declare  lucifer_$set_key  entry  (bit  (128)); 

call lucifer_$set_key (key);

This  entry  saves  the  key  in  internal  static.  This  key 
will  be  used  for  all  future  enciphering  and  deciphering 
until  set_key  is  called  again. 

To  encipher: 

declare lucifer_$encipher entry (dimension (*) bit (128),
     dimension (*) bit (128), fixed binary precision (35));

call lucifer_$encipher (cleartext, ciphertext, code);

The  packed  bit  array,  cleartext,  is  enciphered  and 
deposited  in  the  equal-sized  array  ciphertext.  The  code 
argument  will  be  set  to  zero  unless  the  dimensions  of 
cleartext  and  ciphertext  do  not  agree,  in  which  case  code 
will be set to one and the enciphering not performed. The
ciphertext  and  cleartext  may  be  the  same  variable. 

To decipher:

call lucifer_$decipher (ciphertext, cleartext, code);

This  entry  is  declared  the  same  as  encipher,  and  its 
operation  is  similar. 

One problem with this implementation is that Lucifer
produces a full 128-bit block of ciphertext for each 128-bit
block of cleartext. If the cleartext is not a multiple of
128 bits, the last block could be padded with zeroes, but
the output ciphertext corresponding to this block cannot be
truncated. If it is, information will be lost and it will
not be deciphered correctly. This is because on decipher
the truncated block will be padded to 128 bits (with zeroes,
presumably), which is not identical to the original output
of encipher before truncation. Therefore the primitive
subroutines lucifer_$encipher and lucifer_$decipher require
data to be passed in 128-bit blocks.

To make this more palatable to Multics users (to whom
data tends to come in multiples of 9-bit characters or
36-bit words anyway), a command has been written to
translate an entire segment. To set the key, type:
set key -key-


where -key- is an octal string, which will be padded or
truncated to 128 bits.
To encipher a segment, type:

encipher -cleartext- -ciphertext-
The segment whose relative pathname is -cleartext- will be
enciphered. If the optional argument -ciphertext- is not
given, the original segment will be overwritten; otherwise
the ciphertext will be written onto the segment named
-ciphertext-.

The input will be padded with zeroes to a multiple of 128
bits, and the output segment will be equal in length. Note
that this padding can never require additional pages, since
a page is 36*1024 bits long, a multiple of 128.
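The padding rule and the page argument above can be checked with a short Python sketch. The helper names are hypothetical; strings of "0"/"1" characters stand in for segment contents, and the only Multics fact used is the one stated in the text, that a page is 36*1024 bits.

```python
BLOCK_BITS = 128
PAGE_BITS = 36 * 1024  # one Multics page, as stated in the text (36,864 bits)


def padded_bit_length(bit_count: int) -> int:
    """Round a bit length up to the next multiple of 128."""
    return -(-bit_count // BLOCK_BITS) * BLOCK_BITS


def pages_needed(bit_count: int) -> int:
    """Number of pages needed to hold bit_count bits."""
    return -(-bit_count // PAGE_BITS)


def pad_bits(bits: str) -> str:
    """Pad a string of '0'/'1' characters with zeroes to a multiple of 128 bits."""
    return bits + "0" * (padded_bit_length(len(bits)) - len(bits))
```

Since PAGE_BITS is itself a multiple of 128 (36,864 = 288 × 128), rounding up to the next 128-bit boundary can never cross a page boundary. For example, a segment of 100 nine-bit characters (900 bits) is padded to 1024 bits and still fits in its single page.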

To decipher, type:

decipher -ciphertext- -cleartext-
This command operates in the same way as encipher. Since
the ciphertext segment must be a multiple of 128 bits long,
exactly as produced by encipher, the deciphered output will
be exactly as long; decipher has no way of knowing how long
the original was. This can damage standard object segments,
which have significant words expected to be found at the end
of the segment. Note that a better version of this command
would encipher the original cleartext length into the
ciphertext segment.
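The improvement suggested above, recording the original length inside the ciphertext, can be sketched as follows. Everything here is hypothetical: the framing format (one leading block holding the byte length), the use of 8-bit bytes rather than Multics 9-bit characters, and the pluggable encipher_block/decipher_block callables, which merely stand in for the lucifer_$encipher and lucifer_$decipher entries.

```python
from typing import Callable

BLOCK_BYTES = 16  # one 128-bit block, expressed in 8-bit bytes (assumption)
BlockFn = Callable[[bytes], bytes]


def frame_and_encipher(data: bytes, encipher_block: BlockFn) -> bytes:
    # Hypothetical format: first block = original length, then zero-padded data.
    header = len(data).to_bytes(BLOCK_BYTES, "big")
    padded = data + b"\x00" * (-len(data) % BLOCK_BYTES)
    plain = header + padded
    return b"".join(
        encipher_block(plain[i:i + BLOCK_BYTES]) for i in range(0, len(plain), BLOCK_BYTES)
    )


def decipher_and_unframe(cipher: bytes, decipher_block: BlockFn) -> bytes:
    plain = b"".join(
        decipher_block(cipher[i:i + BLOCK_BYTES]) for i in range(0, len(cipher), BLOCK_BYTES)
    )
    length = int.from_bytes(plain[:BLOCK_BYTES], "big")
    # The recorded length, not the padded size, decides where the data ends.
    return plain[BLOCK_BYTES:BLOCK_BYTES + length]
```

Spending a whole 128-bit block on the length is wasteful, but it keeps every block aligned, and because the length block is itself enciphered it gets the same protection as the data. Data that genuinely ends in zero words is also restored exactly, which naive stripping of trailing zeroes could not guarantee.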


An Enciphering Module for Multics


page  22 


TIMING  MEASUREMENTS  AND  CONCLUSIONS 

One of the important questions addressed by this paper
is "Is it possible to take an algorithm designed for easy
hardware implementation and efficiently translate it to
software?". Performance measurements by Feistel show that
the Lucifer hardware module enciphered a 128-bit block in
about 165 microseconds. A version written in 360 assembly
language for the 360/67 required about 9 milliseconds. The
current Multics hardware, the Honeywell model 6180, executes
instructions at approximately the same rate as the IBM
360/67. The PL/I version, as expected, was extremely slow
and required 10.4 seconds to encipher 72 blocks of 128 bits
each, or 144 milliseconds/block. The assembly language
version required 0.4 seconds for 72 blocks, or 5.5
milliseconds/block. Multiplying by ten the number of blocks
passed to lucifer_ did not substantially reduce the
time/block, suggesting that 5.5 milliseconds represents real
computation and not overhead. Since Multics characters are
nine bits long, Lucifer requires 5.5 milliseconds × (9/128),
or about 390 microseconds, per character enciphered.
Currently the Multics I/O system requires about 100
microseconds per character for its processing; thus if
Lucifer were used for all I/O, the per-character cost would
grow from about 100 to about 490 microseconds, a severe
performance degradation. However, this speed probably
suffices for the occasional use to which it might be put.
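The derived figures in this section follow from the quoted measurements by simple arithmetic (values rounded):

```latex
\begin{aligned}
\frac{10.4\ \text{s}}{72\ \text{blocks}} &\approx 144\ \text{ms/block} && \text{(PL/I)} \\
\frac{0.4\ \text{s}}{72\ \text{blocks}} &\approx 5.6\ \text{ms/block} && \text{(assembly; reported as 5.5)} \\
5.5\ \text{ms/block} \times \frac{9\ \text{bits/char}}{128\ \text{bits/block}} &\approx 387\ \mu\text{s/char} \approx 390\ \mu\text{s/char} \\
\frac{390\ \mu\text{s/char}}{100\ \mu\text{s/char}} &\approx 3.9 \\
\frac{5.5\ \text{ms/block}}{165\ \mu\text{s/block}} &\approx 33
\end{aligned}
```

The last two ratios answer the questions posed in the abstract: in software, Lucifer would add nearly four times the current I/O system's per-character processing time, and the assembly version is roughly 33 times slower than the hardware module.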
There are some possibilities for further speed-up of
the assembly language version; this is discussed in
Appendix
APPENDIX A - OPERATION OF THE LUCIFER HARDWARE

This  appendix  explains  the  details  of  the  operation  of 
Lucifer  as  it  was  originally  designed,  as  a  hardware  device. 
This  material  is  drawn  from  J.  Lynn  Smith's  "The  Design  of 
Lucifer,  a  Cryptographic  Device  for  Data  Communications". 

A  copy  of  the  PL/I  program  which  implements  the 
algorithm,  duplicating  very  closely  the  exact  bit  flows 
within  the  hardware,  is  shown  and  explained  in  Appendix  B. 

Several cautions must be observed in reading the
hardware diagram given in Figure 4. Individual bits of a
given byte are arrayed vertically across registers; bytes
are numbered right-to-left, bits of a byte top-to-bottom.
Thus each vertical column below represents one byte of eight
bits. Therefore if the bytes are adjacent (0, 1, 2, etc.),
the storage order in memory (in a two-dimensional array) is
according to the ordered pairs in each bit position shown
below.
Figure 4: Hardware Schematic



Note also that the author assumed that high-order bits
are transmitted first; the Smith paper does not specify
this. Thus bits are first loaded into position 0 of the
convolution registers (top half), then positions 1, 2, etc.,
on to position 0 of the source registers (bottom half).

Each  of  the  registers  shown  is  connected  as  a  circular 
shift-register.  In  addition,  bits  can  be  shifted  from  the 
convolution  registers  to  the  source  registers  and  back  for 
the  interchange  operation. 

A complete enciphering or deciphering operation for one
128-bit block consists of sixteen
confusion-interruption-diffusion (CID) cycles, with an
interchange cycle between each pair of successive CID
cycles, for a total of 15 interchange cycles.

At  the  start  of  a  CID  cycle,  byte  0  of  the  key  is 
copied  into  the  transformation-control  register.  This 
register  will  supply  eight  bits  for  controlling  the 
confusion  operation;  each  bit  will  correspond  with  one  byte 
of  the  source  registers. 

A CID cycle consists of eight shifts of the source,
convolution, and transformation-control register (TCR). The
TCR shifts vertically upward; the other registers rotate
horizontally, byte n going to byte mod(n - 1, 8).

An individual shift of a CID cycle occurs as follows.
Byte 0 is taken from the source registers. It flows into
the confusion box along with bit 0 of the TCR. A one-to-one
transformation is applied to this byte, according to the bit
from the TCR. The output from the confusion box is an
eight-bit confused byte. Each bit of the confused byte is
exclusive-ored with some bit of the convolution registers;
note that no two of these bit positions are in the same
byte. Each of these result bits is exclusive-ored with some
bit of the rightmost byte of the key; this constitutes the
interruption function. The result of this operation is
stored in the bit position of the convolution registers to
the right of the pair of exclusive-or gates. Note that
diffusion occurs before interruption, but this is immaterial
since mod-2 addition is commutative and associative. As the
result bit is stored in the convolution registers, the
convolution registers, source registers, and TCR undergo a
shift. Thus the bit that previously was to the right of the
exclusive-or gates in the convolution registers is not
destroyed; it is shifted right, and the result of diffusion
occupies its old position.

These shifts are executed eight times for each CID
cycle. In addition, during each shift the 16-byte key
registers each rotate right one position, with one
exception: the key registers are not rotated during the last
shift of each CID cycle. On encipher nothing more happens;
on decipher the key registers instead rotate two positions
after the last shift. Thus seven key shifts occur per CID
cycle on encipher and nine key shifts occur per CID cycle on
decipher. This, coupled with an initial shift of nine
positions before deciphering any blocks, constitutes the
only difference between enciphering and deciphering.

When the eight shifts of one CID cycle are complete, the
source registers will be back in their original position.
The convolution registers are also restored, except that
each of their 64 bits has been exclusive-ored with exactly
one key bit and exactly one bit of a confused source byte.
This is guaranteed by placing the gates in a different byte
position for each bit of the confused byte. The key
registers have been rotated either seven times (for
encipher) or nine times (for decipher). The TCR has yielded
all its bits. An interchange cycle now occurs, unless this
is the last CID cycle. This consists of connecting
positions 0 and 7 of the source registers with positions 7
and 0 of the convolution registers, respectively; eight
shifts now occur. This merely swaps the contents of the
registers.

Now  the  next  CID  cycle  begins.  A  new  key  byte  is 
fetched  into  the  TCR.  On  CID  cycle  1  this  will  be  byte  7 
for  encipher  and  byte  2  for  decipher  of  the  original  key. 

It is important that the key bits be accessed in the
reverse order (between CID cycles) when deciphering as
compared to enciphering, but in the same order within each
CID cycle. This is to ensure reversibility, as explained
earlier. In addition, for cryptographic strength each bit
of the key should be accessed an equal number of times:
eight times for interruption and once for transformation
control of one byte of the source registers. The following
method of accessing key bytes was thus devised. For either
operation, the key is first loaded into the key registers.
If a decipher is to be performed, the key registers are then
rotated so that the first CID cycle will use bytes 9 through
0 (that is, 9, 10, ..., 15, 0) rather than 0 through 7. At
the end of each CID cycle there will be no extra key shifts
on encipher, but there will be two during decipher. This
causes the key bytes to be accessed as shown in Table 1.


Table 1: Key Byte Access Schedule

CID cycle   encipher                   decipher
    0        0  1  2  3  4  5  6  7     9 10 11 12 13 14 15  0
    1        7  8  9 10 11 12 13 14     2  3  4  5  6  7  8  9
    2       14 15  0  1  2  3  4  5    11 12 13 14 15  0  1  2
    3        5  6  7  8  9 10 11 12     4  5  6  7  8  9 10 11
    4       12 13 14 15  0  1  2  3    13 14 15  0  1  2  3  4
    5        3  4  5  6  7  8  9 10     6  7  8  9 10 11 12 13
    6       10 11 12 13 14 15  0  1    15  0  1  2  3  4  5  6
    7        1  2  3  4  5  6  7  8     8  9 10 11 12 13 14 15
    8        8  9 10 11 12 13 14 15     1  2  3  4  5  6  7  8
    9       15  0  1  2  3  4  5  6    10 11 12 13 14 15  0  1
   10        6  7  8  9 10 11 12 13     3  4  5  6  7  8  9 10
   11       13 14 15  0  1  2  3  4    12 13 14 15  0  1  2  3
   12        4  5  6  7  8  9 10 11     5  6  7  8  9 10 11 12
   13       11 12 13 14 15  0  1  2    14 15  0  1  2  3  4  5
   14        2  3  4  5  6  7  8  9     7  8  9 10 11 12 13 14
   15        9 10 11 12 13 14 15  0     0  1  2  3  4  5  6  7

The key byte used for transformation control is the first
(left-hand) entry of each cycle's list of bytes. Note that
the decipher schedule is the same as the encipher schedule
read upside down, but within a CID cycle, read horizontally,
bytes are accessed in the same order. Also note that after
sixteen CID cycles the key registers will be positioned
ready for the next block: at byte 0 for encipher, byte 9 for
decipher.
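The rotation rules above are enough to regenerate Table 1. The Python sketch below models the 16-byte key register as a read position that advances one byte per rotation; the direction of advance is an assumption, chosen so that byte numbers increase as they do in the table, and key_schedule is a hypothetical name. For each CID cycle it records the key byte read at each of the eight shifts, the first of which is the byte copied into the TCR.

```python
KEY_BYTES = 16
SHIFTS_PER_CYCLE = 8


def key_schedule(decipher: bool, cycles: int = 16):
    """Return (rows, final_position) for one block.

    rows[c] lists the key byte read at each of the eight shifts of CID cycle c;
    rows[c][0] is the byte copied into the TCR at the start of the cycle.
    Assumption: each rotation advances the read position by one byte.
    """
    pos = 9 if decipher else 0  # decipher: initial nine-position rotation
    rows = []
    for _ in range(cycles):
        row = []
        for shift in range(SHIFTS_PER_CYCLE):
            row.append(pos)
            if shift < SHIFTS_PER_CYCLE - 1:  # no rotation during the last shift
                pos = (pos + 1) % KEY_BYTES
        if decipher:
            pos = (pos + 2) % KEY_BYTES  # two extra rotations after the last shift
        rows.append(row)
    return rows, pos


enc, enc_end = key_schedule(decipher=False)
dec, dec_end = key_schedule(decipher=True)
```

Because 7 and 9 are both relatively prime to 16, the starting bytes of the sixteen cycles are all distinct, so each key byte controls the TCR exactly once; each byte also appears in eight of the sixteen eight-byte rows, which gives the eight interruption uses. The decipher rows come out as the encipher rows in reverse order, and the final positions are byte 0 and byte 9, as stated above.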

The exact nature of the confusion operation has not
been explained yet. Exactly what it is matters little, as
long as it is one-to-one and sufficiently random. It works
as follows. Each byte to be confused (from the source
registers) is split into two four-bit halves. If the key
bit from the TCR for this byte is 1, the two halves are
exchanged; otherwise no operation is performed. Next, each
four-bit half undergoes a one-to-one mapping. The hardware
used decoders, encoders, and permuted wires, but effectively
a table look-up was done to associate with each of the
sixteen bit combinations a unique four-bit replacement. The
mappings for the two halves are different; the one for the
top half is called S0 and the one for the bottom half is S1
(Table 2). Finally an 8-bit byte is generated by permuting
the eight wires from these two mapping networks. The result
of this entire confusion operation (and the way it is done
in the software versions) is to consider the key bit
concatenated with the source byte as a nine-bit index into a
512-element table. Each element is an eight-bit confused
byte. This is explained in Appendix B, the PL/I
implementation.
Table 2: Four-bit Permutations

input    S0      S1
0000     1100    0111
0001     1111    0010
0010     0111    1110
0011     1010    1001
0100     1110    0011
0101     1101    1011
0110     1011    0000
0111     0000    0100
1000     0010    1100
1001     0110    1101
1010     0011    0001
1011     0001    1010
1100     1001    0110
1101     0100    1111
1110     0101    1000
1111     1000    0101
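The following Python sketch shows how the confusion steps described above collapse into a single 512-entry table indexed by the key bit concatenated with the source byte. S0 and S1 are transcribed from Table 2. Treating the top half as the high-order four bits and bit 0 as the high-order bit are assumptions, and because the final permutation of the eight output wires is not reproduced in this section, the sketch uses an identity permutation as a clearly hypothetical placeholder, so its table will not match the hardware's.

```python
# S0 and S1 transcribed from Table 2 (list index = four-bit input value).
S0 = [0b1100, 0b1111, 0b0111, 0b1010, 0b1110, 0b1101, 0b1011, 0b0000,
      0b0010, 0b0110, 0b0011, 0b0001, 0b1001, 0b0100, 0b0101, 0b1000]
S1 = [0b0111, 0b0010, 0b1110, 0b1001, 0b0011, 0b1011, 0b0000, 0b0100,
      0b1100, 0b1101, 0b0001, 0b1010, 0b0110, 0b1111, 0b1000, 0b0101]

# HYPOTHETICAL placeholder: output bit j is taken from mapped bit j.
OUTPUT_WIRES = [0, 1, 2, 3, 4, 5, 6, 7]


def permute_wires(byte: int, wires: list[int]) -> int:
    """Output bit j is input bit wires[j]; bit 0 is the high-order bit."""
    out = 0
    for j, src in enumerate(wires):
        out |= ((byte >> (7 - src)) & 1) << (7 - j)
    return out


def confuse(key_bit: int, byte: int) -> int:
    top, bottom = byte >> 4, byte & 0xF   # assumed: top half = high nibble
    if key_bit:                            # key bit 1 exchanges the halves
        top, bottom = bottom, top
    mapped = (S0[top] << 4) | S1[bottom]
    return permute_wires(mapped, OUTPUT_WIRES)


# Index = key_bit * 256 + byte: first half for key bit 0, second for key bit 1.
CONFUSION_TABLE = [confuse(i >> 8, i & 0xFF) for i in range(512)]
```

Because the half exchange, both four-bit mappings, and the wire permutation are all one-to-one, each 256-entry half of the table is itself a permutation of the byte values, whatever the real output permutation turns out to be.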
APPENDIX B - THE PL/I IMPLEMENTATION

The PL/I implementation is very similar to the hardware design.
However, instead of rotating data toward the low-address end of
each register, index values into fixed arrays are decremented
and wrapped around to the high-order end. Note very carefully
that each byte shown in the hardware diagram (its bits arrayed
vertically) is a row of a two-dimensional array. Thus if a
conventional PL/I array is printed, it will appear transposed
compared to the map of the registers. For consistency within
this document, all arrays are shown transposed from the
conventional order so that they appear identical to the
hardware bit orderings.

Instead of doing 15 interchanges, 16 are done (unlike most other
operations, an interchange involves a real movement of data).
The last interchange is undone by copying the source registers
into the result block first, followed by the convolution
registers. This avoids checking within the loop for the special
case of the last execution. Similarly, rather than skipping a
key-shift cycle on encipher and performing an extra one on
decipher in each CID cycle, the key index interruption_row is
always incremented eight times. After a CID cycle is complete,
a fixup variable, either_one_or_minus_one, is added modulo 16 to
interruption_row; this variable is -1 for encipher and 1 for
decipher.
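To see why the extra interchange is harmless, a minimal Python sketch can compare the two loop shapes. The round function f is an arbitrary, hypothetical stand-in for one CID cycle (anything computed from the source half and exclusive-ored into the convolution half); the point is only that interchanging on every cycle and then copying the source half first yields the same result block as interchanging fifteen times.

```python
from typing import Callable

# f(source_half, cycle) -> value exclusive-ored into the convolution half.
RoundFn = Callable[[int, int], int]


def fifteen_interchanges(conv: int, src: int, f: RoundFn) -> tuple[int, int]:
    """Straightforward version: the loop must test for the last cycle."""
    for cycle in range(16):
        conv ^= f(src, cycle)
        if cycle != 15:
            conv, src = src, conv
    return conv, src          # result block: convolution half, then source half


def sixteen_interchanges(conv: int, src: int, f: RoundFn) -> tuple[int, int]:
    """PL/I-style version: always interchange, then copy the source half first."""
    for cycle in range(16):
        conv ^= f(src, cycle)
        conv, src = src, conv
    return src, conv          # copying source first undoes the extra interchange
```

The second form has no test inside the loop, which is exactly the saving the text describes.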
The program operates as follows. It copies the first half of a
given 128-bit block into convolution_registers; the second half
is copied into source_registers. The interchange_index loop
counts the CID-interchange cycles, sixteen in number. Within
that loop, each CID cycle begins by assigning interruption_row
to ks_row; interruption_row shows which byte of the key will
next be used for interruption, and ks_row shows which byte will
be used for transformation control. This assignment is the
equivalent of copying the next byte of the key into the TCR at
the start of a CID cycle. Now the data_row loop runs eight
times, once for each byte in source_registers. The entire
confusion operation is implemented by a 512-byte table, the
first half for key bit = 0 and the second half for key bit = 1.
Thus the confused byte is found by indexing this table with the
key bit identified by ks_row and data_row, concatenated with the
source byte identified by data_row. Next, the convolution_index
loop runs eight times, once for each bit in the confused byte.
Note that this is all done in parallel in the hardware version
and in the assembly language version described in Appendix C.
Each bit of the confused byte must be exclusive-ored with some
bit of the key byte identified by interruption_row. Just as the
key interruption wires were permuted in the hardware, key_table
tells which bit of that key byte is supplied for each bit of
the confused byte. This interrupted bit is now exclusive-ored
with some bit of the convolution registers. The register
containing the bit to be diffused (the one to the right of the
exclusive-or gates) is the one corresponding to the source
register from which the interrupted bit was derived. The number
of this register, the column in the PL/I sense (although it is
horizontal on the diagrams), is therefore convolution_index.
The byte in which this bit lies is given by a table,
convolution_table. These positions rotate right around the
registers by one position for each shift of the CID cycle, that
is, once for each increment of data_row. Therefore the
convolution_table entry for this bit of the interrupted byte
must be summed modulo 8 with data_row; this supplies the byte
(row) number of the target bit.

After this byte is complete, interruption_row is incremented
mod 16 to simulate rotating the key registers once to the right.
Now data_row is incremented to have the effect of rotating the
source, convolution, and transformation-control registers.

After the eight passes of the data_row loop, interruption_row
must be readjusted to simulate only seven key shifts on encipher
but nine shifts on decipher. As explained before, the fixup
variable either_one_or_minus_one is added modulo 16 to
interruption_row; this fixup variable is set at the entry
points. The two entry points also set the initial
interruption_row: 0 for encipher, 9 for decipher.

After sixteen passes of the interchange_index loop, sixteen
CID-interchange pairs have been performed. The block is now
copied into the result field; the source registers are copied
first to undo the effect of the extra interchange cycle.
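The walk-through above can be condensed into a runnable Python model. It is a sketch, not a transcription of the thesis's PL/I listing. Variable names follow the text (interruption_row, ks_row, data_row, key_table, convolution_table); the key_table and convolution_table values are the ones quoted in Appendix C, and S0 and S1 come from Table 2. Two pieces are assumptions: bit 0 is taken as the high-order bit of each byte, and the final permutation of the confusion wires is replaced by the identity, so the output will not match the hardware device. Because each CID cycle only exclusive-ors into the convolution half a value computed from the source half and the key, deciphering with the same key restores the block, which is what the checks exercise.

```python
S0 = [12, 15, 7, 10, 14, 13, 11, 0, 2, 6, 3, 1, 9, 4, 5, 8]   # Table 2
S1 = [7, 2, 14, 9, 3, 11, 0, 4, 12, 13, 1, 10, 6, 15, 8, 5]   # Table 2
KEY_TABLE = [2, 5, 4, 0, 3, 1, 7, 6]          # key bit interrupting confused bit j
CONVOLUTION_TABLE = [7, 6, 2, 1, 5, 0, 3, 4]  # byte offset of the diffusion target


def _confuse(index: int) -> int:
    """Nine-bit index (key bit, byte) -> confused byte.

    HYPOTHETICAL: the final 8-wire permutation is replaced by the identity.
    """
    key_bit, byte = index >> 8, index & 0xFF
    top, bottom = byte >> 4, byte & 0xF
    if key_bit:
        top, bottom = bottom, top
    return (S0[top] << 4) | S1[bottom]


CONFUSION = [_confuse(i) for i in range(512)]   # the 512-entry table


def bit(byte: int, j: int) -> int:
    """Bit j of a byte, with bit 0 taken as the high-order bit (an assumption)."""
    return (byte >> (7 - j)) & 1


def cipher_block(block: bytes, key: bytes, decipher: bool = False) -> bytes:
    if len(block) != 16 or len(key) != 16:
        raise ValueError("block and key must both be 16 bytes")
    conv = list(block[:8])            # first half -> convolution_registers
    src = list(block[8:])             # second half -> source_registers
    interruption_row = 9 if decipher else 0
    fixup = 1 if decipher else -1     # either_one_or_minus_one
    for _ in range(16):               # interchange_index
        ks_row = interruption_row     # equivalent of loading the TCR
        for data_row in range(8):
            key_bit = bit(key[ks_row], data_row)
            confused = CONFUSION[(key_bit << 8) | src[data_row]]
            for j in range(8):        # convolution_index
                interrupted = bit(confused, j) ^ bit(key[interruption_row], KEY_TABLE[j])
                target = (data_row + CONVOLUTION_TABLE[j]) % 8
                conv[target] ^= interrupted << (7 - j)
            interruption_row = (interruption_row + 1) % 16
        interruption_row = (interruption_row + fixup) % 16
        conv, src = src, conv         # interchange (one extra on the last cycle)
    return bytes(src + conv)          # source first undoes the extra interchange
```

For example, `cipher_block(cipher_block(block, key), key, decipher=True)` returns `block` for any 16-byte block and key.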
APPENDIX C - THE ASSEMBLY LANGUAGE IMPLEMENTATION

The basic philosophy of the Multics assembly language version
of Lucifer was to produce a program that could encipher or
decipher at the highest speed. This does not contribute to the
readability of the program; therefore this explanation is quite
detailed. If the reader is unfamiliar with Multics assembly
language, a short introduction is given in Appendix D.

The set_key entry does more than store the key in internal
static. During ciphering the key is used in two places:
transformation control and interruption. For reasons explained
later, each purpose requires the key to be in a different
format for optimal operation. To avoid key manipulation during
ciphering, set_key stores the key in two variables, key and
exploded_key.

In exploded_key each bit of the key is given its own nine-bit
byte. The high-order bit of each byte contains the key bit; the
low-order eight bits are zero. This copy of the key is for
transformation control. In the diagram below showing the
storage assignment, the ordered pair in each byte position
gives the key byte number and the bit within that byte. As in
the hardware diagrams, adjacent bits of a byte are arrayed
vertically, although it is more conventional to show memory
words horizontally. Since a 36-bit word holds four 9-bit bytes,
each byte of the key requires two words; thirty-two words hold
all 128 bits.

Figure 5: Exploded Key Bit Assignment
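A Python sketch of the exploded_key layout described above follows. It assumes that bit 0 of each key byte is its high-order bit and that bits 0-3 go into the first word of each pair; the exact arrangement drawn in Figure 5 is an assumption here, and the function names are hypothetical. Each 36-bit word holds four 9-bit bytes, each carrying one key bit in its high-order position, and key byte k occupies words 2k and 2k+1.

```python
def explode_key(key: bytes) -> list[int]:
    """Return 32 36-bit words; key byte k occupies words 2k and 2k+1."""
    if len(key) != 16:
        raise ValueError("key must be 16 bytes")
    words = []
    for k in key:
        bits = [(k >> (7 - j)) & 1 for j in range(8)]   # assumed: bit 0 = high-order
        for half in (bits[:4], bits[4:]):               # four 9-bit bytes per word
            word = 0
            for b in half:
                word = (word << 9) | (b << 8)           # key bit in the byte's top bit
            words.append(word)
    return words


def exploded_doubleword(words: list[int], key_byte: int) -> int:
    """The 72-bit doubleword for one key byte (starting at word 2 * key_byte)."""
    return (words[2 * key_byte] << 36) | words[2 * key_byte + 1]
```

Indexing by doubleword is why, as explained later, interruption_row has to be doubled to address exploded_key.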
For interruption, the key bits within a key byte are not
accessed in the same order as the confused byte's bits (0, 1,
2... 7). Rather they are accessed in the order 2, 5, 4, 0, 3, 1,
7, 6, as given in key_table of the PL/I program or as shown by
the wiring of the hardware. To avoid the use of such a table
and the lookup time during ciphering, the key bytes are
presorted by set_key. Each 8-bit byte of the key is stored in
the high-order part of a Multics 9-bit byte, the remaining bit
being zero. The storage assignment is shown in the diagram
below.


Figure 6: Key Bit Assignment

(entries are key byte numbers)

               word 5  word 4  word 3  word 2  word 1  word 0
byte 0            4       0      12       8       4       0
byte 1            5       1      13       9       5       1
byte 2            6       2      14      10       6       2
byte 3            7       3      15      11       7       3

Words  0  and  1  are  copied  into  words  4  and  5.  This  is 
to  permit  directly  addressing  eight  bytes  starting  at  any 
byte  between  0  and  15  without  programming  a  complicated 
wraparound  routine. 
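The presorting done by set_key and the duplication of words 0 and 1 can be modelled in Python as follows. This is a sketch with hypothetical function names; bit 0 is taken as the high-order bit, and KEY_TABLE is the access order 2, 5, 4, 0, 3, 1, 7, 6 quoted above. Because the first eight 9-bit bytes are repeated at the end, any run of eight consecutive key bytes starting at byte 0 through 15 can be read with a plain slice, with no wraparound test.

```python
KEY_TABLE = [2, 5, 4, 0, 3, 1, 7, 6]   # key bit supplied for confused bit j


def presort_byte(k: int) -> int:
    """Put key bit KEY_TABLE[j] in position j, then pad to 9 bits (pad bit zero)."""
    out = 0
    for j, src in enumerate(KEY_TABLE):
        out |= ((k >> (7 - src)) & 1) << (7 - j)   # bit 0 = high-order (assumed)
    return out << 1                                 # 8 key bits in the high-order part


def build_key_area(key: bytes) -> list[int]:
    """Sixteen 9-bit bytes (words 0-3) plus a copy of the first eight (words 4-5)."""
    nine_bit = [presort_byte(k) for k in key]
    return nine_bit + nine_bit[:8]


def interruption_bytes(area: list[int], start: int) -> list[int]:
    """Eight consecutive key bytes from any start in 0..15, with no wraparound code."""
    return area[start:start + 8]
```
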

The basic idea underlying this program is to process all 64
bits of the source and convolution registers at once, in each
CID cycle. To do this, the key bits must be arranged so that
each one lies in the bit position corresponding to that of the
source register bit with which it will be exclusive-ored during
interruption. This explains the rearranging above.

When the encipher entry is called, it sets interruption_row
(held in index register 2) to zero, as in the PL/I program.
Since an entire CID cycle is done in parallel, interruption_row
is never incremented along the horizontal line of the key byte
access schedule given earlier. Instead it is incremented once
per CID cycle to assume the values given in the schedule's
left-hand column. Examining the schedule, one can see that
interruption_row should thus be incremented by 7 for encipher
and by -7 for decipher, modulo 16. Thus each entry also sets the
variable either_7_or_minus_7 to the appropriate value. This is
added to x2 mod 16 each CID cycle.

After the argument extents are calculated and pointers to the
strings fetched (bp -> input string, bb -> output string), the
main loop is entered.

As in the PL/I program, the first 64 bits of each 128-bit block
are placed into convolution_registers, the next 64 into
source_registers. As with the key, each 8-bit byte is placed in
the high-order eight bits of a Multics 9-bit byte. This
unpacking is accomplished by unpack_loop. This loop depends on
the fact that the assembler assigns source_registers a location
after convolution_registers because it is declared afterward.
The low-order (high-address) bytes are unpacked first.
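A Python sketch of the unpacking step, and of its inverse used when the result is packed, treats a doubleword as a 72-bit integer whose leftmost bit is offset 0. The function names are hypothetical, and the thesis does this with csl instructions rather than arithmetic. With this layout, bit j of byte i sits at offset 9i + j, and offsets 8, 17, ..., 71 hold the zero pad bits.

```python
def unpack(half_block: bytes) -> int:
    """Eight 8-bit bytes -> 72-bit doubleword of 9-bit bytes, pad bit zero.

    Bit offset 0 is the leftmost (most significant) bit of the doubleword.
    """
    if len(half_block) != 8:
        raise ValueError("expected 8 bytes")
    value = 0
    for b in half_block:
        value = (value << 9) | (b << 1)   # data in the high-order 8 bits
    return value


def pack(doubleword: int) -> bytes:
    """Inverse of unpack: drop each 9-bit byte's low-order pad bit."""
    return bytes((doubleword >> (9 * (7 - i) + 1)) & 0xFF for i in range(8))
```
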

Once this is complete, sixteen CID-interchange pairs are
executed.

First, the convolution registers are prepared for the diffusion
operation. Referring to the hardware diagram, one can see that
each bit of a confused, interrupted byte (vertically arrayed)
corresponds to a different byte but the same bit (i.e.,
horizontal register) of the convolution registers. As seen in
the PL/I program, if a source register bit has address [i, j]
(byte i, bit j), the convolution register bit corresponding to
it is

     [mod(i + convolution_table[j], 8), j]

where convolution_table is [7, 6, 2, 1, 5, 0, 3, 4].

Instead of looping through each bit as the PL/I program does,
the assembly version rotates the convolution registers so that
the bit positions for diffusion line up with the corresponding
positions of the source registers.

Since it is the horizontal registers that must be rotated, the
bits to rotate are not adjacent in memory. The bit address of
each bit within the two-word convolution_registers before
rotation is as follows:

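Figure 7: Convolution Registers

                          byte
register    7    6    5    4    3    2    1    0
   0       63   54   45   36   27   18    9    0
   1       64   55   46   37   28   19   10    1
   2       65   56   47   38   29   20   11    2
   3       66   57   48   39   30   21   12    3
   4       67   58   49   40   31   22   13    4
   5       68   59   50   41   32   23   14    5
   6       69   60   51   42   33   24   15    6
   7       70   61   52   43   34   25   16    7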
Notice that bits 8, 17, 26... 71 do not appear on the matrix.
This is due to the unpacking of each 8-bit byte into a 9-bit
byte: the unassigned offsets are those of the pad bits. The
purpose of this rotation is to align all the exclusive-or
positions on the right edge of the matrix. Looking at the
hardware schematic, the desired position of each bit is as
follows:

Figure  8:  Postrotation  Convolution  Registers 


Row 0 (bits 0, 9, 18... 63) must be rotated right on the diagram
(left in the AQ register, as it happens) seven positions, or 63
bits. Row 1 (bits 1, 10, 19... 64) must be rotated 6 positions,
or 54 bits, and so on. An array of masks, and_masks, has been
prepared with a 1-bit in each bit position for a given register.
The masks are ordered according to the number of positions of
rotation needed. Since register 5 needs no rotation (because its
exclusive-or gate is already in byte 0), the mask for it occurs
first. It consists of five zeroes, a one, eight zeroes, a one,
eight zeroes... Thus, when convolution_registers is loaded into
the AQ register and ANDed with this mask, only bits 5, 14, 23...
68 remain. This register is rotated 0 bits left and then ORed
into a previously zeroed doubleword named "normalized".
Next, register 3 must be rotated left one position, or nine
bits. Thus the second mask has a one in bit 3 and a one every
nine bits thereafter. After ANDing convolution_registers with
this mask, only bits 3, 12, 21... 66 remain. The AQ is rotated
left nine bits and ORed into "normalized".

There is a pointer to and_masks called and_masks_ptr. It is
referenced using the add-delta (AD) type of indirect reference.
When an indirect reference is made through this word, after
completion of the specified operation the contents of the delta
field (here 2) are added to the address field. Thus the next
time the AQ is ANDed, the next doubleword mask will be used.
Similarly, an AD word controls the shift count. The first time
through the loop the AQ must be shifted zero bits, so the
address field of this word contains zero. After every indirect
reference the address field is incremented by the delta field,
here nine. Thus the rotate counts will be 0, 9, 18... 63. In
addition, this word is used to control the number of times the
loop will execute. After an add-delta reference is made, the
tally field of the word is decremented by one; if it reaches
zero, the tally-runout indicator is set. This tally field is set
to eight before beginning the loop. Thus the loop will iterate
eight times, because of the ttf (transfer on tally-runout flag
off) instruction at the end.
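The rotate_loop logic can be checked with a small Python model in which the AQ is a 72-bit integer whose leftmost bit is offset 0. The names rotl72, register_mask, and normalize are hypothetical; AND_MASKS mirrors and_masks, and the loop index k stands in for the add-delta shift word, whose address field steps through 0, 9, ..., 63. The checks confirm that each bit of register j in byte i of the convolution registers ends up in byte mod(i - convolution_table[j], 8), i.e. aligned with the source byte it will be diffused from.

```python
CONVOLUTION_TABLE = [7, 6, 2, 1, 5, 0, 3, 4]
WIDTH = 72                       # the AQ register pair
FULL = (1 << WIDTH) - 1


def rotl72(x: int, n: int) -> int:
    """Rotate a 72-bit value left by n bits, as llr does on the AQ."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & FULL


def register_mask(j: int) -> int:
    """1-bits at offsets j, j+9, ..., j+63; offset 0 is the leftmost bit."""
    return sum(1 << (WIDTH - 1 - (9 * i + j)) for i in range(8))


# and_masks: one mask per register, ordered by rotation needed (register 5 first).
AND_MASKS = [register_mask(j)
             for j in sorted(range(8), key=lambda j: CONVOLUTION_TABLE[j])]


def normalize(conv: int) -> int:
    """Mimic rotate_loop: AND with each mask, rotate 0, 9, ..., 63 bits, OR together."""
    normalized = 0
    for k, mask in enumerate(AND_MASKS):
        normalized |= rotl72(conv & mask, 9 * k)
    return normalized
```

Since convolution_table is a permutation of 0 through 7, the k-th mask in this ordering belongs to the register that needs exactly k positions of rotation.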

After preparing the convolution registers, the confusion
operation is performed on the source registers. This is done by
loading the source registers into the AQ and shifting right one
bit position. Now each 8-bit byte appears right-justified in
each Multics 9-bit byte of the AQ. The AQ is now ORed with some
doubleword of exploded_key. Each bit of exploded_key occupies
the high-order bit of a 9-bit byte; thus each bit to be used for
transformation control now resides to the left of the
corresponding byte of the source.
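The effect of the one-bit shift followed by the OR can be seen in a Python sketch (hypothetical function names; a doubleword is modelled as a 72-bit integer with offset 0 leftmost). Shifting the unpacked source right by one moves each zero pad bit into the high-order position of the next 9-bit byte; ORing in the exploded key then fills that position with the key bit, so every 9-bit byte becomes exactly the nine-bit confusion index key_bit * 256 + byte.

```python
def unpack(half_block: bytes) -> int:
    """Eight bytes -> 72-bit value of 9-bit bytes, each data byte followed by a zero pad bit."""
    value = 0
    for b in half_block:
        value = (value << 9) | (b << 1)
    return value


def exploded(key_bits: list[int]) -> int:
    """One exploded_key doubleword: each key bit in the high bit of a 9-bit byte."""
    value = 0
    for k in key_bits:
        value = (value << 9) | (k << 8)
    return value


def confusion_indices(source: int, key_doubleword: int) -> list[int]:
    """Logical right shift by one, then OR with the exploded key; split into 9-bit bytes."""
    aq = (source >> 1) | key_doubleword
    return [(aq >> (9 * (7 - i))) & 0x1FF for i in range(8)]
```

The resulting 9-bit characters can then index a 512-entry table directly, which is what the mvt instruction does next.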

The doubleword of exploded_key to use for transformation
control is the one whose index equals the key byte number held
in interruption_row. This is because each byte of the key uses
one doubleword of exploded_key, and because interruption_row
(in x2) always addresses the first byte of the key to use for
interruption this CID cycle, which is also the byte to use for
transformation control. Since even the doubleword instructions
address memory by word index, interruption_row must be doubled.
This is done by adding it in twice, once in the epplb
instruction and once in the oraq instruction itself.

The AQ is stored and translated by the mvt instruction. The
confusion table used here is identical to the one in the PL/I
program, except that each 8-bit result byte is, as usual,
left-justified within a 9-bit byte.

These confused bytes are now interrupted by exclusive-oring them
with the eight bytes of the key addressed by interruption_row.
Diffusion is obtained by exclusive-oring with the prerotated
convolution registers stored in "normalized".

The interchange operation must, as well as swapping the source
and convolution registers (the latter now stored in
"normalized"), unrotate the convolution registers to undo the
effect of lining up the exclusive-or gates described above. This
is done via a loop very similar to rotate_loop. A subtract-delta
modifier references through and_masks_ptr. Since this modifier
subtracts delta before indirecting, the masks are used in
reverse order. The shift counts needed are shown below; the
add-delta word for shifting again supplies loop control.

Table 3: Convolution Register Rotation Counts

Row    Prerotation (bits)    Postrotation (bits)
 5             0                     72
 3             9                     63
 2            18                     54
 6            27                     45
 7            36                     36
 4            45                     27
 1            54                     18
 0            63                      9

The register accesses and rotate counts for prerotation should
be read down; for postrotation the table should be read up.
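A Python sketch of the prerotation and postrotation passes shows why the unrotate loop restores the registers. The names are hypothetical and the AQ is modelled as a 72-bit integer. The postrotation counts follow the rule that each register's two rotations must total 72 bits, a full circle of the AQ; with the masks taken in reverse order, as subtract-delta supplies them, that gives counts of 9, 18, ..., 72 (a 72-bit rotation is a no-op). The masks themselves need no change, because rotating by a multiple of nine bits keeps every register in its own bit column.

```python
CONVOLUTION_TABLE = [7, 6, 2, 1, 5, 0, 3, 4]
WIDTH = 72
FULL = (1 << WIDTH) - 1


def rotl72(x: int, n: int) -> int:
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & FULL


def register_mask(j: int) -> int:
    return sum(1 << (WIDTH - 1 - (9 * i + j)) for i in range(8))


AND_MASKS = [register_mask(j)
             for j in sorted(range(8), key=lambda j: CONVOLUTION_TABLE[j])]


def prerotate(conv: int) -> int:
    out = 0
    for k, mask in enumerate(AND_MASKS):              # Table 3 read down
        out |= rotl72(conv & mask, 9 * k)              # 0, 9, ..., 63 bits
    return out


def postrotate(normalized: int) -> int:
    out = 0
    for k, mask in enumerate(reversed(AND_MASKS)):    # masks in reverse, Table 3 read up
        out |= rotl72(normalized & mask, 9 * (k + 1))  # 9, 18, ..., 72 bits
    return out
```
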
After sixteen CID-interchange pairs, one more interchange has
been done than desired. This is undone by swapping the two
registers. The bytes are then packed into the result field.

Some possibilities still exist for speeding up this program.
The two loops controlled by tally words loop only eight times;
they could be unrolled into eight copies. Since the address of
and_masks and the rotate counts would be known at compile time
in each copy, no indirect words would be needed. In addition,
the loop control instruction ttf would be eliminated. Counting
ttf as two memory accesses and each of the two tally references
as one, four memory accesses could be saved on each rotation.
Since eight rotations are required in each loop, and there are
two loops, 64 memory accesses would be saved. Eight more would
be saved by eliminating the tally word setup instructions at the
beginning of each loop, for a total of 72. Since there are
sixteen CID cycles, a total of 72 times 16 = 1152 memory cycles
might be saved. This may total as much as a millisecond, thus
saving about twenty percent of the cipher time for a given
block. This demonstrates how sensitive a program's performance
can be to minor changes in coding style. Other experiments are
suggested, such as completely rewriting the program with all
arrays transposed (so that the bits of a byte are not stored
sequentially), or eliminating the padding bit on each byte.
APPENDIX D - INTRODUCTION TO MULTICS ASSEMBLER

This section is intended to be a quick introduction to the
Honeywell model 6180 processor for those who are unfamiliar
with its machine language.

The  6180  is  a  word-addressed  machine  with  a  36-bit 
word;  it  also  possesses  some  very  powerful  bit  string  and 
character  string  handling  instructions.  There  are  two  major 
arithmetic  registers  of  36  bits  each,  the  accumulator  (A) 
and  the  quotient  (Q)  registers.  These  may  be  coupled  to 
form  a  double  length  register,  the  AQ.  Instructions  ending 
in  A,  Q,  or  AQ  operate  on  the  corresponding  registers. 

There are in addition eight index registers of eighteen bits
each. Instructions ending in xN, where N is an octal digit,
operate on these registers. Most index register instructions
take a storage operand in the top half of a word, except for
sxlN (store xN in lower half) and lxlN (load xN from lower
half).

There are eight pointer registers for generating
segment-number/word-number pairs. These registers also contain
a character offset and a bit offset from the addressed word for
the use of character string and bit string instructions. The
names of these registers (in numeric address order) are ap, ab,
bp, bb, lp, lb, sp and sb. The ap points to a procedure's
argument list. The lp points to the procedure's linkage section,
where internal static variables such as the key are kept. The
sp points at the stack frame, in which automatic variables are
kept. Variables declared in a "temp" or "tempd" pseudoop are
placed in the stack frame by the assembler and are given one or
two words each, respectively. A temp variable may also be given
a subscript, in which case it will be assigned that many words.
Declaration in a temp or tempd implies an sp reference. The
other pointer registers are used as spare registers; for
example, the bp points at the input string and the bb points at
the output string.

A sample instruction would be

     ldq  lp|foo

This instruction will load the Q register with the internal
static (because of the lp reference) variable foo.

     adq  15*8,dl

will add 120 to the Q register. The dl address modifier causes
the address field to act like a memory operand, padded on the
left with zeroes. The du modifier pads on the right with zeroes.

The following strange-looking multiword instructions are the
special character string and bit string instructions; this one
performs boolean operations on bit strings. Here a simple move
is indicated.

     csl    (pr,ql),(pr,al),fill(0),bool(move)
     descb  bp|0,8
     descb  convolution,9

will move eight bits from the address bp|0 + ql to a 9-bit
field (padding with a zero bit) at convolution (plus implicit
sp reference) + al. The offset modifiers ql and al refer to the
bottom halves of the Q and A.


     mvt     (pr),(pr)
     desc9a  confused_bytes,8
     desc9a  confused_bytes,8
     arg     confusion_table+3-*,ic

will translate the eight 9-bit bytes at confused_bytes (first
argument) according to the table at confusion_table (third
argument) and deposit the resultant eight 9-bit bytes in
confused_bytes (second argument). The lookup is done by
treating each character as an index into the table.

A  list  of  most  of  the  instructions  used  in  Lucifer  and 
their  meaning  follows. 

ada, q, xN        add to A, Q, xN
ana, q, xN        AND to A, Q, xN
anaq              AND to AQ (two words)
arg               zero opcode (used for mvt table and constants)
cmpa, q, xN       compare A, Q, xN
csl               combine bit strings left (three-word instruction)
descb             a pseudoop which generates a bit string
                  descriptor for a csl instruction
desc9a            generates a 9-bit character descriptor
eaa, xN           effective address to A (top half), xN
eppN              effective pointer to pointer register N
era, q, aq, xN    exclusive or A, Q, AQ, xN
ersa, ersq        exclusive or A, Q to storage
lda, q, aq        load A, Q, AQ
llr               long (AQ) left rotate
lls               long (AQ) left shift
lrl               long (AQ) right logical shift
lxlN              load xN from lower half
mlr               move character string left to right
                  (three-word instruction)
mvt               move with translation (four-word instruction)
ora, q, aq        OR A, Q, AQ
orsa, q           OR A, Q to storage
qls               Q left shift
sba, q, xN        subtract A, Q, xN
sta, q, aq        store A, Q, AQ
stxN              store xN
stz               store zero
tmi               transfer on minus
tnz               transfer on not zero
tpl               transfer on plus (including zero)
tra               unconditional transfer
ttf               transfer on tally-runout flag off


Address modifiers appear after a comma in an address field. For
example

     ldq  bp|0,x2

causes indexing by x2.

xN           index by index register N
*            indirect
*xN or *N    indirect then index (i.e., add the index register to
             the address in the indirect word)
xN* or N*    index then indirect

As  well  as  xN  index  modification,  the  following  can  be 
used  whenever  xN  appears  above: 


au    top of A
al    bottom of A
qu    top of Q
ql    bottom of Q
ic    instruction counter
du    direct to upper
dl    direct to lower
The indirect and tally modifiers add-delta (AD) and
subtract-delta (SD) take an indirect word. With add-delta, the
instruction is first executed on the operand pointed to by the
word's address field (bits 0-17; the operand lies in the same
segment as the AD word); then the delta (rightmost six bits) is
added to the address field, and the tally (bits 18 to 29) is
decremented by one. If the tally reaches zero, the tally-runout
indicator is set, but no fault occurs. Subtract-delta, before
executing the instruction, subtracts the delta from the address
field and increments the tally by one.
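A toy Python model of these two modifiers may make the loop bookkeeping in Appendix C easier to follow. The class and method names are hypothetical, and the model tracks only the three fields described above. It does not model segments, instruction fetch, or whether the real indicator is ever cleared. The usage at the end mirrors rotate_loop: shift counts 0, 9, ..., 63, stopping when the tally runs out after eight passes.

```python
from dataclasses import dataclass


@dataclass
class IndirectTallyWord:
    """Toy model of an AD/SD indirect word (field widths as described above)."""
    address: int          # bits 0-17
    tally: int            # bits 18-29
    delta: int            # bits 30-35
    runout: bool = False  # stands in for the tally-runout indicator

    def add_delta(self) -> int:
        """AD: use the current address, then add delta and decrement the tally."""
        effective = self.address
        self.address = (self.address + self.delta) % (1 << 18)
        self.tally = (self.tally - 1) % (1 << 12)
        if self.tally == 0:
            self.runout = True
        return effective

    def subtract_delta(self) -> int:
        """SD: subtract delta and increment the tally first, then use the address."""
        self.address = (self.address - self.delta) % (1 << 18)
        self.tally = (self.tally + 1) % (1 << 12)
        return self.address


# rotate_loop: tally 8, delta 9, shift count starting at 0
shift_word = IndirectTallyWord(address=0, tally=8, delta=9)
counts = []
while True:                    # loop body, then ttf back while the flag is off
    counts.append(shift_word.add_delta())
    if shift_word.runout:
        break
print(counts)
```

In the unrotate loop, a subtract-delta word whose address starts just past the last doubleword mask would yield mask offsets 14, 12, ..., 0, which is why the masks come out in reverse order.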